%%html
<script src="http://bits.csb.pitt.edu/asker.js/lib/asker.js"></script>
<script>
$3Dmolpromise = new Promise((resolve, reject) => {
require(['https://3dmol.org/build/3Dmol-nojquery.js'], function(){
resolve();});
});
require(['https://cdnjs.cloudflare.com/ajax/libs/Chart.js/2.2.2/Chart.js'], function(Ch){
Chart = Ch;
});
$('head').append('<link rel="stylesheet" href="http://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
//the callback is provided a canvas object and data
var chartmaker = function(canvas, labels, data) {
var ctx = $(canvas).get(0).getContext("2d");
var dataset = {labels: labels,
datasets:[{
data: data,
backgroundColor: "rgba(150,64,150,0.5)",
fillColor: "rgba(150,64,150,0.8)",
}]};
var myBarChart = new Chart(ctx,{type:'bar',data:dataset,options:{legend: {display:false},
scales: {
yAxes: [{
ticks: {
min: 0,
}
}]}}});
};
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
%%html
<div id="secstruct" style="width: 500px"></div>
<script>
$('head').append('<link rel="stylesheet" href="http://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
var divid = '#secstruct';
jQuery(divid).asker({
id: divid,
question: "Which is not an example of protein secondary structure??",
answers: ["Twist","Helix","Sheet","Coil"],
server: "http://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
ProDy stands for Protein Dynamics,
ProDy is designed for normal mode analysis, but also is
This lecture contains:
In interactive sessions, the following is safe (no name conflicts)
from prody import *
checkUpdates()
When developing software, prefer importing specific functions/classes or prefixing
import prody as pd
from prody import parsePDB
ProDy functions start with an action verb followed by one or more names or an abbreviation/extension, i.e. doSomething:
parseEXT(): parse a file in EXT format, e.g. parsePDB, parseDCDwriteEXT(): write a file in EXT format, e.g. writePDB, writeDCDfetchSTH(): download a file, e.g. fetchPDB, fetchMSAcalcSTH(): calculate something, e.g. calcRMSD, calcGyradius, calcANMshowSTH(): show a plotting of something, e.g. showCrossCorrelations, showProteinsaveSTH(): save a ProDy object instance to disk, e.g. saveAtomsloadSTH(): save a ProDy object instance to disk, e.g. loadAtomsClass names start with upper case letters and are abbreviations or words in camel case style, e.g. AtomGroup, Selection, ANM, etc.
Class method naming is similar to function naming:
Class.numSomething: return number of the thing, e.g. AtomGroup.numAtoms returns number of atomsClass.getSomething: return an attribute, e.g. AtomGroup.getNames returns atom namesClass.setSomething: set/update an attribute, e.g. AtomGroup.setNames sets atom namesClass.buildMatrix: builds a matrix, e.g. PCA.buildCovariance calculates covariance matrixubi = parsePDB('1ubi')
ubi
<AtomGroup: 1ubi (683 atoms)>
TIP: to see a list of available functions, just type the action word (calc, show, get, etc.), and use TAB completion.
ubi.numAtoms()
683
calcGyradius(ubi)
12.085104173005442
%matplotlib inline
showProtein(ubi)
<mpl_toolkits.mplot3d.axes3d.Axes3D at 0x12448e190>
import py3Dmol
showProtein(ubi)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x12455ad00>
%%html
<div id="atomcnt" style="width: 500px"></div>
<script>
$('head').append('<link rel="stylesheet" href="http://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
var divid = '#atomcnt';
jQuery(divid).asker({
id: divid,
question: "How many atoms does PDB 3ERK have?",
answers: ["1023","2849","3016","4872"],
server: "http://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
ProDy has a custom format for directly saving atom groups.
saveAtoms(ubi)
'1ubi.ag.npz'
Can also read/write standard protein formats (PDB,PQR)
writePDB('ubi.pdb', ubi)
'ubi.pdb'
parsePDB(writePDB('ubi.pdb', ubi))
<AtomGroup: ubi (683 atoms)>
AtomGroups¶Protein Data Bank (PDB) Structure files can be parsed using parsePDB. For a given PDB identifier, this function will download the file if needed.
ag = parsePDB('1vrt')
Structure data parsed from the file is stored in an AtomGroup instance:
type(ag)
prody.atomic.atomgroup.AtomGroup
Why AtomGroup?
Molecule, because structures are usually made up from multiple moleculesStructure, because PDB format is sometimes used for storing small-moleculesAtomGroup made sense for handling bunch of atoms, and is used by some other packages too.
AtomGroup methods¶Check number of atoms and models (coordinate sets)
ag.numAtoms()
7953
ag.numCoordsets()
1
Use getSomething methods to access data parsed from the file
names = ag.getNames()
names
array(['N', 'CA', 'C', ..., 'O', 'O', 'O'], dtype='<U6')
len(names)
7953
print(names[0])
N
ag.getBetas()
array([86.04, 85.23, 84.2 , ..., 64.55, 58.23, 46.98])
Note that get is followed by a plural name, and method returns a numpy array
Use setSomething methods to set attributes, but don't forget to pass an array or list with length equal to number of atoms
zeros = [0] * len(ag) # same as ag.numAtoms()
ag.setBetas(zeros)
ag.getBetas()
array([0., 0., 0., ..., 0., 0., 0.])
Atom instances¶You can get a handle for specific atoms by indexing
a0 = ag[0]
print(a0)
Atom N (index 0)
Note that getSomething and setSomething methods are available, but that thing is in singular form:
a0.getName()
'N'
a0.getBeta() # we had just set it to zero
0.0
Slicing an AtomGroup will return a handler called Selection for specified subset of atoms. Let's get every other atom:
eoa = ag[::2] # eoa: every other atom
print(eoa)
Selection 'index 0:7953:2'
print(len(eoa))
print(len(ag))
3977 7953
get and set methods in plural form are also available for Selection instances
eoa.getNames()
array(['N', 'C', 'CB', ..., 'O', 'O', 'O'], dtype='<U6')
Macromolecules have a hierarchical structure:
ag.numChains()
2
ag.numResidues()
1233
showProtein(ag)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x124631550>
for ch in ag.iterChains():
print('%s - %d residues' % (str(ch), ch.numResidues()))
Chain A - 688 residues Chain B - 545 residues
for ch in ag.iterChains():
print(ch)
for res in ch:
print(' |-%s' % res)
break
print('...')
Chain A |-PRO 4 ... Chain B |-ILE 5 ...
Also iterAtoms (default), iterBonds, iterCoordsets, iterFragments, iterResidues, iterSegments
Index with chain identifier to get a chain instance:
chA = ag['A']
print(chA)
type(chA)
Chain A
prody.atomic.chain.Chain
Index with a pair of chain identifier and residue number to get a handle for a residue:
chA_res10 = ag['A', 10]
print(chA_res10)
type(chA_res10)
VAL 10
prody.atomic.residue.Residue
Of course, plural forms of get and set methods and other methods are available:
chA_res10.getNames()
array(['N', 'CA', 'C', 'O', 'CB', 'CG1', 'CG2'], dtype='<U6')
chA_res10.numAtoms()
7
%%html
<div id="resname" style="width: 500px"></div>
<script>
$('head').append('<link rel="stylesheet" href="http://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
var divid = '#resname';
jQuery(divid).asker({
id: divid,
question: "What is residue 12 of 3ERK?",
answers: ["VAL","ALA","PHE","ARG"],
server: "http://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
The atom selection engine is what makes ProDy a powerful tool. Selection grammar is similar to that of VMD but with added flexibility of Python. AtomGroup.select method accepts selection arguments and returns a Selection instance:
sel = ag.select('protein and name CA')
print(sel)
print('# of atoms: %d' % sel.numAtoms())
set(sel.getNames())
Selection 'protein and name CA' # of atoms: 925
{'CA'}
Same as using the keyword 'ca' or 'calpha'
ag.select('ca') == ag.select('calpha') == ag.select('protein and name CA')
True
You can make proximity based selections:
kinase = parsePDB('3erk')
bindingsite = kinase.select('within 5 of (hetero and not water)')
print('# of atoms: %d' % bindingsite.numAtoms())
print(set(bindingsite.getResnames()))
# of atoms: 98
{'LYS', 'SER', 'ASP', 'LEU', 'GLN', 'SB4', 'MET', 'VAL', 'ALA', 'HOH', 'CYS', 'ILE'}
exwithin excludes the initial selection
noligand = kinase.select('exwithin 5 of (hetero and not water)')
print('# of atoms: %d' % noligand.numAtoms())
print(set(noligand.getResnames()))
# of atoms: 73
{'LYS', 'SER', 'ASP', 'LEU', 'GLN', 'MET', 'VAL', 'ALA', 'HOH', 'CYS', 'ILE'}
%%html
<div id="withincnt" style="width: 500px"></div>
<script>
$('head').append('<link rel="stylesheet" href="http://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
var divid = '#withincnt';
jQuery(divid).asker({
id: divid,
question: "How many non-water protein atoms are there within 5A of SB4?",
answers: ["67","70","73","78"],
server: "http://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
Words used within the selection string that are not reserved keywords can be substituted with keyword arguments.
Select atoms that are close to a point in space:
import numpy as np
origin = np.zeros(3)
print(origin)
sel = ag.select('within 10 of origin', origin=origin)
sel
[0. 0. 0.]
<Selection: 'index 3426 to 3437 3439 to 3461' from 1vrt (35 atoms)>
calcDistance(sel, origin)[:5]
array([9.93856378, 9.16952812, 9.08634382, 9.79169985, 8.45576029])
ag.select('within 5 of center',center = calcCenter(ag))
<Selection: 'index 4457 to 4...49 to 7353 7941' from 1vrt (19 atoms)>
Dot operator can also be used to make selections:
ag.calpha
<Selection: 'calpha' from 1vrt (925 atoms)>
ag.name_CA_and_resname_ALA
<Selection: 'name CA and resname ALA' from 1vrt (37 atoms)>
See full documentation of selection grammar at: http://prody.csb.pitt.edu/manual/reference/atomic/select.html
AtomGroup can handle multiple models in a PDB file. Models are called coordinate sets.
ubi = parsePDB('2k39')
ubi.numCoordsets()
116
Active coordinate set (ACS) can be queried or changed using getACSIndex and setACSIndex methods:
ubi.getACSIndex()
0
Coordinates are numpy arrays
coords = ubi.getCoords()
coords.mean(axis=0)
array([26.02865475, 25.60081316, 20.56833875])
ubi.setACSIndex(115)
ubi.getCoords().mean(axis=0)
array([26.49875061, 26.09516897, 20.5057498 ])
Selections have their own active coordinate set indices:
ubi_ca = ubi.calpha
ubi_ca.getACSIndex()
115
ubi_ca.setACSIndex(0)
print(ubi_ca.getACSIndex())
print(ubi.getACSIndex())
0 115
Call copy method to make a copy of AtomGroup or Selection instance
ag.copy()
<AtomGroup: 1vrt (7953 atoms)>
ca = ag.select('ca')
ca.copy()
<AtomGroup: 1vrt Selection 'ca' (925 atoms)>
AtomGroup addition¶Lot's of customization has gone into ProDy classess handling atoms. Addition of AtomGoups (__add__) results in a new AtomGroup:
ag_copy = ag.copy()
moveAtoms(ag_copy, by=np.array([50, 50, 50]))
ag_copy['A'].setChid('C')
ag_copy['B'].setChid('D')
ag_new = ag + ag_copy
showProtein(ag_new)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x124730430>
AtomGroup membership¶You can use in to see whether a Selection is in another one (enabled by implementing Selection.__contains__() method):
ag.calpha in ag.backbone
True
ag.backbone in ag.protein
True
ag.water in ag.protein
False
ag.water in ag
True
Selection operations¶You can use bitwise operations on selections as follows:
chA = ag.chain_A
chB = ag.chain_B
print(chA & chB) # intersection
None
sel = chA | chB # union
sel.getSelstr()
'(chain A) or (chain B)'
~chA #not
<Selection: 'not (chain A)' from 1vrt (3461 atoms)>
AtomGroup¶You can store arbitrary atomic data in AtomGroup class and make it persistent on disk:
atoms = parsePDB('1p38')
atoms.getResnums()
array([ 4, 4, 4, ..., 771, 773, 776])
resnum_fract = atoms.getResnums() / 10.
resnum_fract
array([ 0.4, 0.4, 0.4, ..., 77.1, 77.3, 77.6])
atoms.setData('resnumfract', resnum_fract)
atoms.getData('resnumfract')
array([ 0.4, 0.4, 0.4, ..., 77.1, 77.3, 77.6])
saveAtoms(atoms)
'1p38.ag.npz'
atoms = loadAtoms('1p38.ag.npz')
atoms.getData('resnumfract')
array([ 0.4, 0.4, 0.4, ..., 77.1, 77.3, 77.6])
Can request multiple files at once. Will download to local directory.
fetchPDB('1p38', '1r39', '@!~#') # searches working directory, local resources, then downloads
['1p38.pdb.gz', '1r39.pdb.gz', None]
fetchPDBviaHTTP('1p38')
fetchPDBviaFTP('1p38')
p38 = parsePDB('1p38')
bound = parsePDB('1zz2')
showProtein(p38, bound)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x1248422e0>
matchChains tries to match chains by sequence. Returns a list of all pairwise matches
apo_chA, bnd_chA, seqid, overlap = matchChains(p38, bound)[0]
print(bnd_chA)
print(seqid)
print(overlap)
AtomMap Chain A from 1zz2 -> Chain A from 1p38 99.40652818991099 96.01139601139602
Match is alpha carbon only
print(len(apo_chA),len(bnd_chA),len(p38),len(bound))
337 337 2962 2872
set(apo_chA.getNames())
{'CA'}
calcRMSD(apo_chA, bnd_chA)
72.93023086946586
bnd_chA, transformation = superpose(bnd_chA, apo_chA)
calcRMSD(bnd_chA, apo_chA)
1.8628014908695503
showProtein(p38, bound)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x1246a2b80>
PDB files often contain monomer coordinates, when the structure of the multimer can be obtained using symmetry operations. Let’s use some ProDy function to build tetrameric form of KcsA potassium channel:
# parse the monomer structure and PDB file header information
# the header includes transformations for forming tetramer
monomer, header = parsePDB('1k4c', header=True)
monomer
<AtomGroup: 1k4c (4534 atoms)>
showProtein(monomer, legend=True)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x124618d30>
without_K = monomer.not_name_K
without_K
<Selection: 'not name K' from 1k4c (4527 atoms)>
tetramer = buildBiomolecules(header, without_K)
tetramer
<AtomGroup: 1k4c Selection 'not name K' biomolecule 1 (18108 atoms)>
showProtein(tetramer)
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
<py3Dmol.view at 0x1246182b0>
Write a script, align.py, that takes an arbitrary number of PDB codes as arguments, downloads the files, aligns the structures to the first structure, then writes out the aligned structures as PDB files.
The following code is buggy. What are the problems?
pdbs = ['3erk','4zzn','4s32','3qyz'] #this will substitute for sys.argv[1:]
mols = [fetchPDB(pdb) for pdb in pdbs]
for m in mols:
m0, mi = matchChains(mols[0], m)
superpose(mi, m0)
for (n,m) in zip(pdbs,mols):
writePDB(n+'.pdb',m)
showProtein(*mols)